Enhancing and Performance Comparison of Various Truth Discovery Approaches
Shikha Agrawal1 and Rajesh Tiwari2
1M. Tech. Scholar, Shri Sankarachary College of Engineering & Technology, Bhilai
2Asst. Professor, Shri Sankarachary College of Engineering & Technology, Bhilai
*Corresponding Author E-mail: pramodkumarpandey@yahoo.co.in
ABSTRACT:
High-quality, personalized recommendations are a key feature in many online systems. Since these systems often have explicit knowledge of social network structures, the recommendations may incorporate this information. This paper focuses on networks that represent trust and on recommendation systems that incorporate these trust relationships. The goal of a trust-based recommendation system is to generate personalized recommendations by aggregating the opinions of other users in the trust network. In analogy to prior work on voting and ranking systems, we use the axiomatic approach from the theory of social choice. We develop a set of five natural axioms that a trust-based recommendation system might be expected to satisfy. Then, we show that no system can simultaneously satisfy all the axioms. However, for any subset of four of the five axioms we exhibit a recommendation system that satisfies those axioms. Next, we consider various ways of weakening the axioms, one of which leads to a unique recommendation system based on random walks. We consider other recommendation systems, including systems based on personalized PageRank, majority of majorities, and minimum cuts, and search for alternative axiomatizations that uniquely characterize these systems. Finally, we determine which of these systems are incentive compatible, meaning that groups of agents interested in manipulating recommendations cannot induce others to share their opinion by lying about their votes or modifying their trust links. This is an important property for systems deployed in a monetized environment.
KEYWORDS: Recommendation systems, Reputation systems, Axiomatic approach, Trust networks
I. INTRODUCTION:
Ranking, reputation, recommendation, and trust systems have become essential ingredients of web-based multi-agent
systems (e.g. [16, 22, 8, 25, 11]). These systems aggregate agents’ reviews of products, services, and each other, into valuable information. Notable commercial examples include
Amazon and eBay's recommendation and reputation systems (e.g. [21]), Google's PageRank system [19], and the Epinions web of trust/reputation system (e.g. [18]). This paper is concerned particularly with the setting where there is a single item of interest (e.g., a product, service, or political candidate). A subset of the agents have prior opinions about this item. Any of the remaining agents might desire to estimate whether or not they would like the item, based on the opinions of others. In the offline world, a person might first consult her friends for their recommendations.
In turn, the friends, if they do not have opinions of their own, may consult their friends, and so on. Based on the cumulative feedback the initial consulter receives, she might form her own subjective opinion. An automated trust-based recommendation system aims to provide a similar process to produce high-quality personalized recommendations for agents.
We model this setting as an annotated directed graph in which some of the nodes are labeled by votes of + and −. Here a node represents an agent, and an edge directed from
a to b represents the fact that agent a trusts agent b. A subset of the nodes are labeled by + or − votes, indicating that these nodes have already formed opinions about the item under question. Based on this input, a recommendation system must output a recommendation for each unlabeled node. We call such an abstraction a voting network because it models a variety of two-candidate voting systems, where the candidates are + and −. For example, consider a directed star graph where a single root node points to n agents with labels, which models a committee making a recommendation to the root node. In that setting, majority and consensus are two common voting rules. As another example, the U.S. presidential voting system can be modeled as a more complicated digraph, where the root points to nodes representing the members of the Electoral College, and the Electoral College nodes point to nodes representing the voters in the state or congressional district that they represent.
A multitude of recommendation systems have been proposed and implemented, and many fit into the network-based framework described above. This raises the question of how to determine the relative merits of alternative approaches to providing trust-based recommendations. The task of comparing recommendation systems is complicated by the difficulty of producing an objective measure of recommendation quality. We begin with an impossibility theorem: for a small, natural set of axioms, there is no recommendation system simultaneously consistent with all axioms in the set. However, for any proper subset of the axioms there exists a recommendation system that satisfies all axioms in the subset. We consider two ways past this negative result, both by replacing the transitivity axiom (defined below). We prove that there are recommendation systems consistent with both new sets of axioms. We also show that when one of these new sets is augmented with an additional axiom, the resulting set of axioms is uniquely satisfied by a recommendation system based on random walks.
We also consider the descriptive approach, in which we characterize existing (acyclic) systems, like simple committees and the U.S. presidential elections, by a simple majority axiom. We generalize this to an axiom that leads to a unique "minimum cut" system on general undirected (possibly cyclic) graphs. Prior work on personalized ranking systems has produced a ranking system called personalized PageRank [13]. This system can be translated into a recommendation system by augmenting it to handle votes, creating a natural hybrid of a ranking and recommendation system. The details are discussed in Section 3.4. We define a notion of incentive compatibility for recommendation systems. This is important when designing systems for deployment in monetized settings, because, as experience has shown, self-interested agents will not respect the rules of the system when there is money to be made by doing otherwise. We find that all of the recommendation systems for which we provide a characterizing set of axioms turn out to be incentive compatible, including the random walk system, the majority-of-majorities system, and the minimum cut system. In contrast, the personalized PageRank system and various other natural systems are not incentive compatible.
For simplicity, this paper focuses on the case of unweighted
multigraphs and binary votes. Most of our observations carry over to the more general cases of weighted graphs and fractional votes. The rest of this paper is organized as follows. The next section contains related work. Section 1.2 is a brief summary of our axioms, algorithms, and results. Some formal definitions and notations appear in Section 2, and in Section 3 we present some recommendation systems. In Section 4 we formally define our axioms. Section 5 provides the outline of the proofs of our results. Section 6 shows that our systems are incentive compatible.
1.1 Related Work
There are several ways to study recommendation systems. Evaluation tools include simulations and field experiments (e.g. [7, 21, 15]). In addition, one may also consider computational properties of suggested systems. Our work builds on previous work on axiomatizations of ranking systems. The literature on the axiomatic approach to ranking systems deals with both global ranking systems [1, 2, 24, 9, 25, 4, 5] and personalized ranking systems [7, 10, 3, 17]. Personalized ranking systems are very close to trust-based recommendation systems. In such systems, agents rank some of the other agents. Then an aggregated ranking of agents, personalized to the perspective of a particular agent, is generated based on that information. However, previous studies on the axiomatic approach have not been concerned with situations where the participants share reviews or opinions on items of interest which are not the other agents in the system. Many existing recommendation systems are based on collaborative filtering (CF), which is a completely different approach from the trust-based systems considered in this paper. An axiomatic approach to analyzing CF systems appears in [20]. Combining trust-based and CF approaches is a direction of current research [23].
1.2 Overview of Results
We model a voting network by a partially labeled directed multigraph whose nodes represent agents. A subset of the nodes, called voters, is labeled with votes of + and −. The remaining nodes are nonvoters. The recommendation system must assign, to each source nonvoter, a recommendation in {−, 0, +}. Below we informally summarize our axioms. Many of them are illustrated in Figure 1. We caution that our aim is only to succinctly convey the spirit of the axioms; formal definitions are found in Section 4.
1. Symmetry. Isomorphic graphs result in corresponding isomorphic recommendations (anonymity), and the system is also symmetric with respect to + and − votes (neutrality).
2. Positive response. If a node’s recommendation is 0 and an edge is added to a + voter, then the former’s recommendation becomes +.
3. Independence of Irrelevant Stuff (IIS). A node's recommendation is independent of agents not reachable from that node. Recommendations are also independent of edges leaving voters.
4. Neighborhood consensus. If a nonvoter's neighbors unanimously vote +, then the recommendation of other nodes will remain unchanged if that node instead becomes a + voter.
If, in a particular graph, a source node is recommended +, then we say that the source trusts the set of agents that voted + more than those that voted −. The following axiom states that this relation should be transitive, as we imagine varying the votes of various subsets of agents.
5. Transitivity. For any graph (N,E) and disjoint sets A,B,C ⊆ N, for any source s, if s trusts A more than B, and s trusts B more than C, then s trusts A more than C.
Figure 1: Illustrative voting networks. Labels ± indicate votes, not recommendations. a) IIS: Node s's recommendation does not depend on any of the dashed nodes or dashed edges, since we ignore unreachable nodes as well as edges out of voters. b) Neighborhood consensus: Node v can be assigned a + vote. c) If the recommendation for s is +, then we say s trusts {v} more than {u}. d) Trust propagation: The dashed edges in the upper part can be removed and replaced by the dashed edges in the lower part. e) Scale invariance: Edges leaving s are tripled without consequence. f) No groupthink: The three nonvoting nodes cannot all be given + recommendations.
Theorem 1. Axioms 1-5 are inconsistent. Any proper subset of these axioms is satisfied by some recommendation system.
Instead of transitivity, we consider the following two axioms. Similar axioms have been used in the axiomatization of PageRank in the context of ranking systems [2].
6. Trust propagation. Suppose node u trusts nonvoter v. Suppose that u has k edges to v and v has k edges to other nodes. Then the edges from u to v can be replaced directly by edges from u to the nodes that v trusts without changing any resulting recommendations.
7. Scale invariance. Duplicating the outgoing edges of a node does not change recommendations.
We specify a random walk algorithm, a (deterministic) algorithm that computes a recommendation for node s by considering a (hypothetical) random walk in the directed graph that starts at node s and follows outgoing edges, uniformly at random, until the first voter is reached. Its recommendation for s is based on whether it is more likely that the first voter reached votes + or −. The system computed by the random walk algorithm is called the Random Walk recommendation system (RW).
Theorem 2. Axioms 1-4 and 6-7 are satisfied uniquely by RW.
As we will see in the proof of Theorem 1, the transitivity axiom and the IIS axiom are hard to reconcile, because IIS implies that the edges out of the voting nodes do not matter, while transitivity implies that the sets in the trust graph must obey a certain relation regardless of who is voting. A weaker version of transitivity which does not conflict with IIS is the following:
5′. Sink transitivity. For any graph (N,E) and any disjoint sets A, B, C ⊆ N for which A, B, C contain only vertices with out-degree 0, for any source s, if s trusts A more than B and s trusts B more than C, then s trusts A more than C.
Theorem 3. Axioms 1-4 and 5′ are satisfied by RW. We also consider axioms which lead uniquely to known recommendation systems:
8. Majority. The recommendation of a node should be equal to the majority of the votes and recommendations of its trusted neighbors.
9. No groupthink. Suppose a set of nonvoters unanimously have the same nonzero recommendation. Then their recommendation should equal the majority of their trusted (external) neighbors’ votes and recommendations.
The majority axiom represents a reasonable semantics — an
agent might like to wait for its trusted neighbors to receive a
recommendation and then take a simple majority. However, this axiom alone still permits a large clique of nonvoters to all have positive recommendations when they only point to external agents with negative recommendations (see Figure 1f).
The no-groupthink axiom is a natural extension to larger sets. It implies the majority axiom when one considers just singleton sets. Unfortunately, on general directed graphs axiom 9 is inconsistent. However, it is a statement about a single graph G, so we can consider it on limited classes of graphs. Two interesting classes of graphs are directed acyclic graphs and undirected graphs, where axioms 8 and 9 lead uniquely to two interesting solutions. These are the majority-of-majorities and minimum-cut systems, defined in Section 3.
Theorem 4. (a) Axiom 8 on a rooted DAG implies the majority-of-majorities system. (b) Axiom 9 on an undirected graph implies the min-cut recommendation system.
2. NOTATION AND DEFINITIONS:
Following the motivation provided in the previous section, we now formally define the basic setting of a trust-based recommendation system. In the remainder of the paper, we refer to such systems simply as recommendation systems, for brevity.
Definition 1. A voting network is a directed annotated multigraph G = (N, V+, V−, E), where N is a set of nodes, V+, V− ⊆ N are disjoint subsets of positive and negative voters, and E ⊆ N × N is a multiset of edges, with parallel edges allowed but no self-loops.
When V+ and V− are clear from context, we denote the set of voters by V = V+ ∪ V− and the set of nonvoters by V̄ = N \ V.
Definition 2. A recommendation system R takes as input a voting network G and a source s ∈ V̄ and outputs a recommendation R(G, s) ∈ {−, 0, +}.
For convenience, we will use R+(G), R−(G), and R0(G) to denote the set of sources to which R gives a particular recommendation, i.e. R+(G) = {s ∈ V̄ | R(G, s) = +}. Also, we define R(G) = ⟨R+(G), R−(G), R0(G)⟩.
We denote by sgn : R → {−, 0,+} the function that computes the sign of its input. We denote by PredE(v) and SuccE(v) the multisets of nodes that point to v and that v points to, respectively (where, for example, u appears in PredE(v) with multiplicity equal to the number of arcs (u, v) in multiset E). Given a multiset of recommendations, S ⊆ {−, 0,+}, we define the majority MAJ(S) to be: + if more than half the elements of S are +; − if more than half of S are −; and 0 otherwise.
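For concreteness, the notation above can be mirrored in code. The following Python sketch is our own illustration (the names VotingNetwork, sgn, and maj are ours, not the paper's); later sketches in this section reuse these helpers.

from collections import Counter

# A voting network: nodes, disjoint sets of +/- voters, and a multiset of
# directed edges represented as a list of (u, v) pairs (parallel edges allowed).
class VotingNetwork:
    def __init__(self, nodes, v_plus, v_minus, edges):
        self.nodes = set(nodes)
        self.v_plus = set(v_plus)
        self.v_minus = set(v_minus)
        self.edges = list(edges)          # multiset of (u, v)

    def voters(self):
        return self.v_plus | self.v_minus

    def nonvoters(self):
        return self.nodes - self.voters()

    def succ(self, v):
        # Succ_E(v): multiset of nodes that v points to.
        return [b for (a, b) in self.edges if a == v]

def sgn(x):
    return '+' if x > 0 else '-' if x < 0 else '0'

def maj(signs):
    # MAJ(S): '+' if a strict majority of S is '+', '-' if a strict
    # majority is '-', and '0' otherwise.
    counts = Counter(signs)
    n = len(signs)
    if counts['+'] * 2 > n:
        return '+'
    if counts['-'] * 2 > n:
        return '-'
    return '0'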
3. SYSTEMS AND ALGORITHMS:
3.1 Random Walk System (RW)
We first describe a recommendation system based on random walks. Given a voting network G = (N, V+, V−, E) and a source s ∈ V̄, the random walk system simulates the following: start a walker at node s and, at each step, choose an outgoing edge uniformly at random and follow it to the destination node. Continue this random walk until a node with a ± vote is reached, or until a node with no outgoing edges is reached (note that this walk may reach a node whose strongly connected component contains no voters, and thus never terminate). Let ps be the probability that the random walk terminates at a node with a positive vote and qs be the probability that the random walk terminates at a node with a negative vote. Let rs = ps − qs. The random walk recommendation system recommends sgn(rs) to s.
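A direct, if approximate, way to compute this recommendation is to simulate the walk. The sketch below is our illustration (reusing the VotingNetwork and sgn helpers above); the step cap is an assumption standing in for walks that never reach a voter.

import random

def rw_recommendation_mc(G, s, trials=10000, max_steps=1000):
    # Estimate r_s = p_s - q_s by sampling random walks from s.
    plus = minus = 0
    for _ in range(trials):
        v = s
        for _ in range(max_steps):        # cap guards against non-terminating walks
            if v in G.v_plus:
                plus += 1
                break
            if v in G.v_minus:
                minus += 1
                break
            out = G.succ(v)
            if not out:                   # dead end: walk stops without a vote
                break
            v = random.choice(out)
    r_s = (plus - minus) / trials
    # Sampling noise can make a true 0 look slightly nonzero; exact values
    # come from the linear system in Figure 2.
    return sgn(r_s)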
The algorithm given in Figure 2 correctly computes recommendations as defined by the RW system. To see this, first note that, for any node that cannot reach a voter, the recommendation must be 0 in both the RW system and the RW algorithm. Next, probabilistic arguments show that the equations defined must be obeyed by the random walk. Finally, the uniqueness of the solution follows from the fact that, with probability 1, the random walk will reach a vertex in S ∪ V+ ∪ V−. Also note that the algorithm can be implemented efficiently and in polynomial time.
As stated in Theorem 2 (Section 1.2), the RW recommendation system is the unique system which satisfies axioms 1-4 and 6-7. The formal definitions of these axioms appear in Section 4, and the proof of Theorem 2 is sketched in Section 5 and provided in full in Appendix B. Theorem 3 states that RW satisfies axioms 1-4 and 5′. The proof of this is omitted due to space constraints.
Figure 2: The RW algorithm.
Input: G = (N, V+, V−, E), s ∈ V̄.
Output: recommendation ∈ {−, 0, +}.
1. Let S ⊆ V̄ be the set of nonvoters that cannot reach any voter.
2. For each v ∈ N, create a variable rv ∈ R and solve the following system for the rv:
rv = +1 if v ∈ V+;  rv = −1 if v ∈ V−;  rv = 0 if v ∈ S;  and
rv = (1 / |SuccE(v)|) Σu∈SuccE(v) ru otherwise.
3. Output sgn(rs).
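A minimal exact implementation of the algorithm in Figure 2, assuming numpy is available and reusing the helpers above (all variable names are ours), might look as follows.

import numpy as np

def rw_recommendation(G, s):
    nodes = sorted(G.nodes)
    idx = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)

    # S: nonvoters that cannot reach any voter (plain depth-first search).
    def reaches_voter(v, seen=None):
        seen = seen if seen is not None else set()
        if v in G.v_plus or v in G.v_minus:
            return True
        seen.add(v)
        return any(u not in seen and reaches_voter(u, seen) for u in G.succ(v))
    S = {v for v in G.nonvoters() if not reaches_voter(v)}

    # Linear system: r_v is fixed to +1, -1, or 0 for + voters, - voters,
    # and nodes in S; every other nonvoter satisfies
    # r_v - (1/deg) * sum over successors of r_u = 0.
    A = np.eye(n)
    b = np.zeros(n)
    for v in nodes:
        i = idx[v]
        if v in G.v_plus:
            b[i] = 1.0
        elif v in G.v_minus:
            b[i] = -1.0
        elif v in S or not G.succ(v):
            b[i] = 0.0
        else:
            out = G.succ(v)
            for u in out:                       # multi-edges accumulate weight
                A[i, idx[u]] -= 1.0 / len(out)
    r = np.linalg.solve(A, b)
    return sgn(r[idx[s]])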
3.2 Majority-of-Majorities (MoM)
The systems in this section and the next seem quite different at first glance, but both derive from the same single axiom. In this section, we present a system that is well defined only when the graph underlying the voting network is a directed acyclic graph (DAG). Nodes in a finite DAG can always be partitioned into a finite number of levels. In level 0, nodes have out-degree 0. In each level i ≥ 1, nodes have edges only to nodes in levels j < i. The majority-of-majorities system assigns a recommendation as follows: each nonvoter that is a sink (i.e. in level 0) receives recommendation 0; each nonvoter in level i ≥ 1 receives a recommendation equal to the majority of the recommendations and votes of its outgoing neighbors (where multiple edges are counted multiple times). This can be computed recursively by an efficient algorithm, as sketched below. Recall that we use the definition of majority from Section 2, which is conservative in the sense that in order to have a non-zero recommendation, there must be a strict majority of matching recommendations. As stated in Theorem 4a, the majority-of-majorities recommendation system is the unique system which satisfies axiom 8 when the voting network is a DAG. The proof of Theorem 4 is sketched in Section 5 and provided in full in Appendix C.
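The recursive computation mentioned above can be sketched as follows (our code, reusing the helpers from Section 2; it memoizes recommendations instead of computing levels explicitly, which is equivalent on a DAG).

def majority_of_majorities(G):
    # Recommendations for all nonvoters of a DAG-shaped voting network.
    memo = {}

    def rec(v):
        if v in G.v_plus:
            return '+'
        if v in G.v_minus:
            return '-'
        if v in memo:
            return memo[v]
        out = G.succ(v)                    # multiset: parallel edges count twice
        memo[v] = '0' if not out else maj([rec(u) for u in out])
        return memo[v]

    return {v: rec(v) for v in G.nonvoters()}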
3.3 Minimum Cut System (min-cut)
Let G = (N, V+, V−,E) be a voting network. Let E′ ⊆ E be the set of edges in E that originate at nonvoters, i.e., eliminate edges out of voters. We say that cut C ⊆ E′ is a
(V+, V−)-cut of G if there is no path from V+ to V− using edges in E′ \ C. We say that C is a min-cut of G if its size
|C| is minimal among all (V+, V−)-cuts. The minimum cut system can be defined as follows. The recommendation of a source s is + if in all min-cuts there is a path from s to V+ among edges in E′ \ C. The recommendation is − if all min-cuts leave a path from s to V−. Otherwise, the recommendation is 0. This may be computed as follows: first, compute a min-cut C in G. Next, consider the network G+, which is formed from G by adding an edge from source s to a + voter, and compute a min-cut C+ in G+. If |C| < |C+| then the recommendation is −. Otherwise, consider the network G−, formed from G by adding an edge from s to a − voter. For a min-cut C− of G−, if |C| < |C−| then the recommendation for s is +. Otherwise, the recommendation for s is 0. This can be computed in polynomial time by repeatedly using a polynomial-time algorithm for (s, t)-minimum cut, as sketched below. To see that it correctly computes the min-cut recommendation, note that if s is connected to V+ in all min-cuts, then adding an edge from s to a − voter will create a path from V+ to V− in all min-cuts. This will increase the min-cut cost by 1. On the other hand, if there is a min-cut of G in which s is not connected to V+, then this edge set will still be a cut in G−, so the min-cut cost will not increase. Similarly, |C| < |C+| if and only if s is connected to V− in all min-cuts of G. In the remaining case, the sizes of all three min-cuts will be the same, because there are some min-cuts in which s is connected to V− and some in which s is connected to V+. As stated in Theorem 4b, the min-cut recommendation system is the unique system which satisfies axiom 9 on undirected graphs. The proof of Theorem 4 is sketched in Section 5 and provided in full in Appendix C.
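The three-cut comparison described above can be sketched with an off-the-shelf max-flow routine. The code below is our illustration; it assumes networkx is available, the voting network is undirected and given as an edge list, and V+ and V− are nonempty, and it omits the pruning of E to E′ for brevity.

import networkx as nx

INF = 10**9

def _cut_value(edges, v_plus, v_minus, extra=None):
    # Min cut separating all + voters from all - voters, via a super source/sink.
    D = nx.DiGraph()
    all_edges = list(edges) + ([extra] if extra else [])
    for u, v in all_edges:
        for a, b in ((u, v), (v, u)):       # undirected edge = two unit-capacity arcs
            if D.has_edge(a, b):
                D[a][b]['capacity'] += 1
            else:
                D.add_edge(a, b, capacity=1)
    for v in v_plus:
        D.add_edge('SRC', v, capacity=INF)
    for v in v_minus:
        D.add_edge(v, 'SNK', capacity=INF)
    value, _ = nx.minimum_cut(D, 'SRC', 'SNK')
    return value

def min_cut_recommendation(edges, v_plus, v_minus, s):
    c = _cut_value(edges, v_plus, v_minus)
    # Add an edge from s to some + voter; if the cut grows, s sits with V-.
    c_plus = _cut_value(edges, v_plus, v_minus, extra=(s, next(iter(v_plus))))
    if c < c_plus:
        return '-'
    # Add an edge from s to some - voter; if the cut grows, s sits with V+.
    c_minus = _cut_value(edges, v_plus, v_minus, extra=(s, next(iter(v_minus))))
    if c < c_minus:
        return '+'
    return '0'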
3.4 Personalized PageRank System
In designing a recommendation system, we can consider systems based on aggregating the level of trust a node has for other nodes. The idea is to define a trust level wuv for any two nodes u and v in the network, and to compute the recommendation for a nonvoter u by comparing two values:
W+(u) = Σv∈V+ wuv and W−(u) = Σv∈V− wuv.
A natural way to capture the level of trust in a network is to apply a personalized ranking system such as personalized PageRank (as introduced by Haveliwala in [13]). The personalized PageRank of node v for a node u is denoted by ppr(u → v), and is defined to be the probability mass at node v in the stationary distribution of a biased random walk R(u) with restart probability α ∈ (0, 1). The random walk R(u) is defined as follows: at each step, with probability α, we restart the walk at node u; with probability 1 − α, we move from the current node w to an outgoing neighbor of w chosen uniformly at random, i.e., to each outgoing neighbor with probability 1/|SuccE(w)|; if w has no outgoing neighbors, we restart the walk at node u. Note that, unlike the RW system in Section 3.1, this random walk continues regardless of whether the vertex it is currently visiting is a voter or a nonvoter. The personalized PageRank values can be computed efficiently by simulating this random walk. For details of computation and extensions, see [13, 14]. Using this notation, the personalized PageRank recommendation system is as follows: given a voting network G = (N, V+, V−, E) and a parameter α, we compute the personalized PageRank ppr(s → v) for all pairs of nodes. For a nonvoter s, we compute W+(s) = Σv∈V+ ppr(s → v) and W−(s) = Σv∈V− ppr(s → v), and we set R(G, s) = sgn(W+(s) − W−(s)). It is not hard to see that the personalized PageRank system satisfies the symmetry, positive response, and transitivity axioms, but it does not satisfy the IIS and neighborhood consensus axioms. The proofs of these statements are omitted due to space constraints.
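A compact power-iteration sketch of this recommendation rule follows (our code, reusing the helpers from Section 2; the values of α and the iteration count are illustrative assumptions).

import numpy as np

def ppr_vector(G, u, alpha=0.15, iters=200):
    # Personalized PageRank mass ppr(u -> v) for every node v, by power iteration.
    nodes = sorted(G.nodes)
    idx = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    restart = np.zeros(n)
    restart[idx[u]] = 1.0
    p = restart.copy()
    for _ in range(iters):
        nxt = alpha * restart               # restart mass always returns to u
        for v in nodes:
            out = G.succ(v)
            if out:
                share = (1 - alpha) * p[idx[v]] / len(out)
                for w in out:
                    nxt[idx[w]] += share
            else:
                # Dangling node: the walk restarts at u.
                nxt[idx[u]] += (1 - alpha) * p[idx[v]]
        p = nxt
    return {v: p[idx[v]] for v in nodes}

def ppr_recommendation(G, s, alpha=0.15):
    ppr = ppr_vector(G, s, alpha)
    w_plus = sum(ppr[v] for v in G.v_plus)
    w_minus = sum(ppr[v] for v in G.v_minus)
    return sgn(w_plus - w_minus)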
4. AXIOMS:
We are now ready to consider various properties of recommendation systems as candidate axioms. These properties are motivated by related literature on social choice and ranking systems, and also by the machinery used in practical recommendation systems. Similar to other axiomatic studies, the choice of axioms is to some extent arbitrary, and other sets of axioms are possible. Nevertheless, we believe that any one of our axioms by itself does capture a desirable property for recommendation systems, and that the study of the combination of these axioms leads to informative insights and interesting algorithms.
The first property, symmetry, is purely structural. The symmetry axiom contains two parts, anonymity and neutrality. Anonymity means that the names of the agents do not matter for the source node; all that matters is the structure of the trust graph and the votes provided. Neutrality means that the values +/− are arbitrary.
Axiom 1. (Symmetry) Let G = (N, V+, V−, E) be a voting network. Anonymity: For any permutation π : N → N, let G′ be the voting network isomorphic to G under π. Then R+(G′) = π(R+(G)) and R−(G′) = π(R−(G)). Neutrality:
Also, let G′′ = (N, V−, V+,E) be the voting network where the sets of voters V− and V+ have been interchanged. Then R+(G) = R−(G′′) and R−(G) = R+(G′′).
The next axiom is a classic social choice axiom. It states that if a node s has recommendation 0 (or +) and an additional +-voter is added to the network along with an edge from s to the new node, then s’s new recommendation should be +. It reflects a razor’s-edge view of a 0 recommendation.
The axiom “pushes” the systems towards strict recommendations. (Without such an axiom, systems may almost always recommend 0.)
Axiom 2. (Positive response) Let w ∉ N, s ∈ V̄, G = (N, V+, V−, E), and G′ = (N ∪ {w}, V+ ∪ {w}, V−, E ⊎ {(s, w)}). If s ∉ R−(G), then s ∈ R+(G′).
Note that the above axiom is presented asymmetrically in terms of ± votes and recommendations. In combination with the Symmetry axiom, the corresponding version with − votes and recommendations follows directly. We use an asymmetric presentation for readability in several of the axioms. The next axiom, Independence of Irrelevant Stuff (IIS) captures the semantics of recommendation systems discussed in the introduction: a source node is viewed as consulting neighboring agents in the trust graph, who consult their neighbors etc., while agents who have formed opinions just vote according to their opinion. This means that when considering the recommendation for a particular source node in a particular trust graph, where part of the agents vote (perhaps based on first-hand experience), feedback from these agents is independent of who they trust (i.e., they trust themselves infinitely more than others) and the recommendation system should consider only reachable nodes and ignore links out of voters. While one may consider other types of semantics, something similar to this axiom appears in many previously designed systems.
Axiom 3. (IIS) Let G = (N, V+, V−, E) and let e ∈ V × N be an edge leaving a voter. Then for the subgraph G′ = (N, V+, V−, E \ {e}), in which e has been removed, R(G) = R(G′). Similarly, if v ∈ N is a node not reachable from s ∈ V̄, then for the subgraph G′′ in which node v (and its associated edges) has been removed, R(G, s) = R(G′′, s). When we write R(G) = R(G′), as above, it means that the recommendations on the two voting networks are identical.
The following requirement deals with some minimal ratio- nality we wish to attribute to the agents; as in the classical theory of choice we are willing to assume something about the vote of an agent who has no a priori opinion only in extreme cases. The neighborhood-consensus axiom does just that: if all the agents trusted by a node v vote +, and no other nodes touch v’s neighbors, then the recommendation of any other nonvoter u will remain unchanged if v instead becomes a + voter.
Axiom 4. (Neighborhood Consensus) Fix a voting network G = (N, V+, V−,E), and let s, u ∈ V be distinct nonvoters. Suppose u has at least one outgoing edge, and suppose that each outgoing edge (u, v) ∈ E points to v such that v ∈ V+ and v has no (incoming or outgoing) neighbors other than u. Let G′ = (N, V+ ∪ {u}, V−,E). Then R(G, s) = R(G′, s).
4.1 Transitivity
Transitivity is a central concept in the axiomatization of voting [6]. In our context, we consider the case where the underlying trust graph is fixed, but the system needs to deal with more than one item, and different subsets of nodes vote on different items. The definition of transitivity is based on the following binary relation, which describes whether a specified source node trusts a set of nodes A more than another set of nodes B.
Definition 3. Let G = (N, V+, V−, E) be a voting network. If s ∈ R+(G), then we say that s trusts V+ more than V− relative to the multigraph (N, E).
For each source node, a binary relation is generated. The transitivity axiom stipulates that each of these binary relations must be transitive.
Axiom 5. (Transitivity) For all multigraphs (N, E), s ∈ V̄, and disjoint A, B, C ⊆ N, if, relative to (N, E), s trusts A more than B and s trusts B more than C, then s trusts A more than C.
4.2 Trust Propagation
In this section, we consider propagation of trust. Intuitively, if u trusts v and v trusts w, then u should trust w, at least to some extent. Much has been written about trust propagation within social networks (see, e.g., [12]) and the axiom below is a conservative interpretation that agrees with much of the literature. One would like to say that if u trusts nonvoter v, and v trusts w, then we can simply add an edge from u to w without changing anything. However, our system is supposed to reflect degrees of trust, and this would falsely inflate such trust. Instead, we count edges as follows. Suppose there are k edges leaving v (that don’t point to u). Suppose that there happen to be k edges from u to v. Then we can remove k edges from u to v and replace them by k new edges from u to the k nodes that v trusts (besides u), and no recommendations should change
Axiom 6. (Trust propagation) Let G = (N, V+, V−, E) be a voting network, let u ≠ v be nonvoters in V̄, and suppose that the edges leaving v (besides those to u) are (v, w1), . . . , (v, wk), with wi ≠ u, for some integer k. Suppose that E contains exactly k copies of (u, v). Then, for E′ = (E ⊎ {(u, w1), . . . , (u, wk)}) \ {the k copies of (u, v)} and G′ = (N, V+, V−, E′), we have that R(G) = R(G′).
Another natural axiom is scale invariance. Loosely speaking, this means that the amount of trust placed in a node is relative.
Axiom 7. (Scale invariance) Let G = (N, V+, V−, E) be a voting network, u ∈ V̄, and k ≥ 1. Let G′ = (N, V+, V−, E ⊎ E′), where E′ is the multiset containing k copies of each of the edges leaving u. Then R(G) = R(G′). The axiom states that we can duplicate all edges leaving a node an arbitrary number of times without changing recommendations.
4.3 Majority and Groupthink
The majority axiom states that a node's recommendation should equal the majority of the recommendations and votes of its trusted neighbors. We say the sign of an edge is positive, negative, or neutral if it points to a node with positive vote or recommendation, negative vote or recommendation, or 0 recommendation, respectively.
Axiom 8. (Majority) Let G = (N, V+, V−, E) be a voting network, and let s ∈ V̄ be any nonvoter. Then the recommendation of s should be equal to the majority of the signs of the edges leaving s.
We are using the strict notion of majority as defined in Section 2. This choice is somewhat arbitrary, though it fits well with the next axiom. Also note that one can further axiomatize majority itself, but we leave it as is for the sake of brevity. Unfortunately, the majority axiom by itself does not imply unique recommendations on cyclic graphs (think about a graph with two nonvoters that point to each other). Instead, we consider the following property. Groupthink refers to a social phenomenon in which an entire group of people arrive at a ridiculous conclusion through self-reinforcing interactions. The no-groupthink axiom rules this out and imposes strong semantics on the system. There are two parts to this axiom. First, we consider the case that an entire group of nonvoters are all recommended +. This strong position should be based on something external, since no member voted. The requirement is that, among the edges leaving the group, a majority must point to nodes with + votes or recommendations. Furthermore, if a majority of the edges leaving the group point to nodes with + votes or recommendations, then the group must contain at least one node with + recommendation. Since no-groupthink is inconsistent for general directed graphs, we define it for specific graphs. We say that a recommendation system R avoids groupthink for G if the following holds:
Axiom 9. (No groupthink) Let S ⊆ V̄ be a nonempty set of nonvoters. Let E′ be the multiset of edges in E from S to N \ S. (a) If S ⊆ R+(G), then a strict majority of the edges in E′ must point to nodes in R+(G) ∪ V+. (b) If a strict majority of the edges in E′ point to nodes in R+(G) ∪ V+, then S ∩ R+(G) ≠ ∅.
Since we know f2 is true, we are fairly confident that f3 is true, somewhat confident about f4, but not so confident about f5, because each hop introduces uncertainty.
Theorem 2 shows that adding a neutral fact achieves the same effect as performing propagation decay in each iteration. Thus it is unnecessary to perform such decay when using the neutral fact. This is the second role of the neutral fact (the first role is to guarantee the existence of a unique solution and convergence of the iterative algorithm), which is also very important to our approach.
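The precise SSTF objective and update rule are not reproduced in this paper. Purely to illustrate the propagation pattern being discussed, the following is a hedged sketch of label-propagation-style confidence scoring on a fact graph with a neutral fact; every name, weight, and the update rule itself are our assumptions, not the authors' formulation.

import numpy as np

def propagate_confidence(W, labels, neutral_weight=1.0, tol=0.01, max_iters=100):
    # Toy confidence propagation over a fact graph.
    # W       : symmetric (n x n) array of edge weights between facts.
    # labels  : dict {fact_index: +1.0 or -1.0} for ground-truth (training) facts.
    # A 'neutral fact' with confidence 0 is connected to every fact with weight
    # neutral_weight; it damps scores, playing the role described in the text.
    n = W.shape[0]
    c = np.zeros(n)
    for i, y in labels.items():
        c[i] = y
    for _ in range(max_iters):
        prev = c.copy()
        for i in range(n):
            if i in labels:               # labeled facts stay clamped
                continue
            denom = W[i].sum() + neutral_weight
            # Weighted average of neighbors' confidences; the neutral fact
            # contributes 0 to the numerator but neutral_weight to the denominator.
            c[i] = W[i].dot(prev) / denom if denom > 0 else 0.0
        change = np.linalg.norm(c - prev) / (np.linalg.norm(prev) + 1e-12)
        if change < tol:                  # stopping rule analogous to Section 5.1
            break
    return c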
5. EXPERIMENT RESULTS:
We test our approach, SSTF, on six real-world data sets, including the data set containing book authors used in [16] and [6], four data sets from HTML tables on the web for entities of certain types, and a huge data set containing hundreds of millions of entity-attribute-value triples extracted from HTML tables all over the web. Small-scale experiments are performed on a PC with an Intel Quad-Core 2.66 GHz CPU and 32 GB of memory. Web-scale experiments are performed on a PC cluster based on Dryad [12] that supports MapReduce. We also test the scalability of our approach on synthetic data sets at different scales.
5.1 Book Authors Data Set
The first experiment is based on a real-world data set containing authors of computer science books, which is the only real-world data set used in [16] and [6]. This data set is extracted from AbeBooks.com. Each book is listed on a set of online bookstores, each providing the authors of the book. The goal is to find the correct list of authors for each book. There is a ground truth set containing the authors of 100 randomly selected books, created by manually inspecting the images of the book covers. This data set contains 1263 books and 24364 listings from 877 bookstores. Both [16] and [6] provide detailed experimental results on this data set, although using different evaluation criteria. We adopt the criteria of [6] so we can directly compare with its results. For each book, the author names are normalized into a list of names, where duplicate names are removed and middle names are ignored. The author names of a book are considered to be correct if and only if they exactly match the ground truth after normalization. Cases such as additional, missing, misordered, and misspelled names are all considered incorrect. We use the definition of similarity between two sets of authors from [16].
The parameters of SSTF are set as follows. The weight of an edge between two facts from the same data source is set to 0.01. The weight of an edge from the neutral fact to each fact is 1. After each iteration, we compute the relative change of the confidence score vector, ‖c(t) − c(t−1)‖ / ‖c(t−1)‖, where c(t) denotes the confidence score vector after iteration t, and the iterative procedure stops if the relative change is less than 0.01 after any iteration.
We first test how fast SSTF converges and how its accuracy changes over iterations. Figure 4 shows the accuracy of our approach SSTF (Semi-Supervised Truth Finder) as we vary the amount of training data from 12.5% to 87.5% of the 100 labeled examples. n-fold cross-validation is used in each experiment. For example, 8-fold validation is used when using 12.5% of the labeled examples as training data, which means 12 training examples are used in four folds and 13 in the other four folds. We can see that the accuracy improves over iterations most of the time.
Figure 5 shows the relative change in the confidence score vector after each iteration. We can see that SSTF converges at a steady and reasonably fast pace. Figure 6 shows the sensitivity of SSTF to different parameter values, where we vary the weight of an edge to/from the neutral fact from 0.1 to 5, and the weight of an edge between two facts from the same data source from 0.001 to 0.05. We can see that SSTF is not sensitive to changes in its parameters.
We compare our approach (with 75 training examples) to the following algorithms: (1) Voting, which considers the fact provided by the most data sources as the true fact (a fact is randomly chosen in case of a tie); (2) TruthFinder, as described in [16]; (3) Accu, as described in [6]; (4) AccuWithSim, as described in [6]; (5) 2-Estimates, which is described in [10] and has the highest accuracy among the methods in [10]. Because we use the same data set and evaluation criteria as in [6], we simply report their results for Accu and AccuWithSim. The other methods are implemented according to their papers. Table 2 shows the accuracy, number of iterations used, and total running time for each approach. (The running times of Accu and AccuWithSim are from [6], which may have used a less powerful computer, as the reported running time of TruthFinder in [6] is four times that in our experiment.) Our approach, SSTF, achieves higher accuracy than existing approaches, especially when compared to TruthFinder, which does not detect data copying behaviors (and neither does SSTF). This experiment shows that SSTF can significantly improve accuracy with a small training set.
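For reference, the Voting baseline in (1) amounts to a per-entity frequency count with random tie-breaking; a minimal sketch (ours) is:

import random
from collections import Counter

def voting_baseline(claims):
    # claims: dict {entity: list of values claimed by the different sources}.
    # Returns the most frequently claimed value per entity, breaking ties randomly.
    result = {}
    for entity, values in claims.items():
        counts = Counter(values)
        best = max(counts.values())
        result[entity] = random.choice([v for v, c in counts.items() if c == best])
    return result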
5.2 Web Data Sets
5.2.1 Data Collection
It has been observed that HTML tables on the web provide a huge number of facts, though they contain much noise [4]. We extract facts from tables and use our approach to distinguish true facts from false ones. As in [4], we focus on attribute-value tables, each of which contains one column of attribute names and another column of values, with two examples shown in Figure 7. Such tables widely exist on the web and usually provide popular facts for each entity, making them the best subjects for truth discovery.
The following method is used to extract facts from HTML tables. We build a table classifier using the approach from [2], and train it with a manually labeled set of attribute-value tables. With this classifier, we extract 744M attribute-value tables from 20B web pages in Bing's index on 2010/06/22. However, as these attribute-value tables are not associated with any entity, we use the following method to find the main entity that each web page talks about. The main entity of a web page can often be found by matching user queries leading to a click on this page with the page content. For example, a user may search for "Britney Spears music" and click on http://www.last.fm/music/Britney+Spears, whose title is "Britney Spears – Discover music, videos, concerts, & pictures at Last.fm". We find the longest common substring between the query and the title, which is "Britney Spears", the subject of this page. In addition, for each query and clicked web page, we try to match the query with the text in each <h1> element. If no match is found, we try to match with each <h2> element, and so on, until we find a match or have tried every element in the page. The matched part is considered a candidate for the main entity of the page. For each candidate we build a wrapper based on HTML tag-paths [13]. For example, the wrapper for the above page from Last.fm is "<html><head><title>(*) – Discover music, videos, concerts, & pictures at Last.fm". In order to select good wrappers and use them to extract entity names, we utilize the fact that many web sites contain large numbers of web pages in the same format (e.g., all movie pages on IMDB). If we can build a wrapper for extracting the main entity from some pages, we can extract entities from other pages of the same format. We use the approach in [17] to find sets of web pages in the same format. For each such set of pages, we choose the wrapper that extracts entities from the most pages that match the user queries. This wrapper is then used to extract the main entity from every page in that set.
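The longest-common-substring step described above can be done with the Python standard library; a small sketch (ours, with hypothetical function names) is:

from difflib import SequenceMatcher

def main_entity_candidate(query, page_text):
    # Longest common substring between the query and a page element
    # (title, <h1>, <h2>, ...), used as a candidate for the page's main entity.
    q, t = query.lower(), page_text.lower()
    m = SequenceMatcher(None, q, t).find_longest_match(0, len(q), 0, len(t))
    return query[m.a:m.a + m.size].strip()

# Example from the text:
# main_entity_candidate("Britney Spears music",
#     "Britney Spears - Discover music, videos, concerts, & pictures at Last.fm")
# returns "Britney Spears".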
We use all query-click logs from the U.S. market between 2008/08/01 and 2009/05/31, which contain each search query and all URLs clicked for it. Based on these queries and the 20B web pages in Bing's index on 2010/06/22, we extract the main entities from 93.3M pages with attribute-value tables, which have 164M tables in total. These entities are joined with the attribute values extracted from the tables on these pages. 749M entity-attribute-value triples are created, which will be used in our experiments. We extract the data from Wikipedia page titles and infobox tables (e.g., the table in Figure 7(a)) and use them as ground truth data. We want to remove all web sites that get the majority of their data from Wikipedia, in order to perform a fair comparison between our approach and unsupervised approaches. For any web site with at least half of its data identical to some Wikipedia data, we consider it a "Wikipedia copier" and remove it from our data set. Although this may falsely remove some web sites, we can be reasonably sure that the remaining web sites get the majority of their data from sources other than Wikipedia.
5.2.2 Domain-Specific Experiments
We first test our approach on four data sets in specific domains and compare it with existing approaches. The four data sets are directors of American films, developers (i.e., studios) of video games, governors of U.S. states, and presidents of universities. We collect the four sets of entities from specific Wikipedia categories, as shown in Table 3, where all entities with disambiguation pages are removed (except U.S. states). Then we collect the values of the specified attribute from the HTML table data set, and create four fact sets as described in Table 4. Note that entities whose specified attribute is not provided by any site are ignored. SSTF uses the same parameters as in the book authors data set, and all other approaches are implemented according to their papers. The similarity function for directors of films is the same as that in the book authors data set. In the other three data sets the similarity function between two string values is defined as their edit distance divided by the sum of their textual lengths (the similarity is scaled to [−1, 1] for SSTF), as sketched below. We ignore all ground truth facts that are not provided by any other web sites. 4-fold experiments are used for SSTF, with 75% of ground truth facts allocated as training data and 25% for testing in each fold. All other approaches use all ground truth facts for testing.
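A sketch of the string similarity described above (our code; the exact scaling used by SSTF is not given in the text, so the linear mapping to [−1, 1] below is an assumption):

def edit_distance(a, b):
    # Standard Levenshtein distance via dynamic programming.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def string_similarity(a, b):
    # Edit distance divided by the sum of lengths lies in [0, 1];
    # map it linearly into [-1, 1] (assumed scaling; identical strings score +1).
    if not a and not b:
        return 1.0
    ratio = edit_distance(a, b) / (len(a) + len(b))
    return 1.0 - 2.0 * ratio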
SSTF is compared to Voting, TruthFinder, AccuWithSim, and 2-Estimates. The accuracy of an approach is defined as the percentage of entities for which the correct fact is selected. All algorithms converge on all data sets, except AccuWithSim, which oscillates on three of the four data sets. When it oscillates among multiple states, we use the average accuracy of those states as its accuracy. Figure 8 shows the accuracy of each approach on these data sets. SSTF achieves accuracies between 78% and 96%, and it is significantly more accurate than the other approaches on all data sets except directors of films. On governors of U.S. states and presidents of universities it beats all other approaches by at least 10%. On directors of films the accuracies of the different approaches are very close, with Voting and 2-Estimates being the most accurate and SSTF 0.4% behind. Figure 9 shows the running time of the five approaches on the four data sets.
5.2.3 Web-scale Truth Discovery
In this subsection we present an experiment that involves all
749M facts extracted from HTML tables as described in Section 5.2.1. A web site often provides facts about many attributes for many entities. If a web site provides correct values for an attribute on some entities (e.g., date of birth of some people), we can expect it to provide correct values on this attribute for other entities.
Therefore, we treat each web site and each attribute as a data source. Similarly, we also treat each web site and each entity as a data source. We do not consider two facts with different entities and attributes to be from the same data source, because a web site may have very different trustworthiness on different attributes or different entities. There are 65.7M entities and 749M facts (591M distinct facts) from 33K web sites. According to the definition above, there are 89M data sources, of which 15.5M are website-attribute pairs and 73.5M are website-entity pairs. Some data sources provide very large numbers of facts (as many as 5.2M), while on average each data source provides only 8.42 facts. The numbers of edges from different facts vary greatly. Thus we define the weight of an edge to/from the neutral fact according to Equation (7).
The edge weight between two facts from the same data source is set to 0.5, since facts from each data source are highly homogeneous: they are either about the same entity or the same attribute. Again, the facts from Wikipedia infobox tables are used as ground truth. There are three runs of our approach:
One uses all ground truth facts for training (30.7M facts), one uses 10% of them, and the last uses 1% of them. Instead of continuing to use Wikipedia to measure the accuracy of our approach, we measure it based on whether such data is useful in answering queries containing an entity and an attribute, which better reflects the usefulness of such data for web search. For example, if the user searches for {Barack Obama date of birth} and our data set contains some values for the entity "Barack Obama" and attribute "date of birth", we will measure whether the corresponding values are correct. First we need to create a test set such that facts appearing more often in user queries have a larger chance of being selected. Our fact set is tail-heavy, and most facts, such as the price of a product or the version of some software, are uninteresting. In order to avoid selecting many trivial facts for evaluation, we only consider facts that appear in the Bing web search queries between 2008/08/01 and 2009/05/31. A fact is considered to be in the query set if there is a query consisting of the entity name and the attribute name. Each fact is weighted by the frequency of the corresponding query (i.e., the number of times the query is submitted to Bing). We randomly select 1000 facts, with the probability that each fact is selected proportional to its weight, and we exclude 20 facts that appear in the ground truth set. We also exclude 120 facts whose values are long textual contents, such as the biography of a person or the symptoms of a disease, because it is hard to judge whether such facts are correct. We manually label each fact as true or false according to the semantic meaning of the corresponding query, using the following method. Given a fact, which is an entity-attribute-value triple, we need to first decide the identity of its entity, as the entity name for some facts can be very ambiguous (e.g., "system" as the entity). We consider the entity with the specified name in the first result page of Bing as the true entity. Consider the fact "Brazil: Time zone = UTC -2 to -5" (i.e., the entity is "Brazil", the attribute is "Time zone", and the value is "UTC -2 to -5"). We consider the entity to be the country of Brazil, which is the main entity in the first result page of Bing for the query "Brazil time zone", and thus this fact is true. The fact "System: Security = Basic" is false because the first result page of the query "system security" is not about the entity "system". The second requirement for a fact to be true is that there exists an entity with the specified attribute and value, as some facts are simply wrong, such as "Sony: Camera = Canon 300D". Among the 860 labeled facts, 68 are true and 792 are false. Note that most of the false facts are simply not facts, such as "baby: games = 32", "Comcast: email = enter missing info" and "myspace: comments = add". They are extracted because the HTML table classifier has limited accuracy. We compare SSTF with TruthFinder [16]. We implement SSTF with MapReduce, and implement TruthFinder with MapReduce as well. The approach in [6] is not MapReduceable as it involves sorting. Therefore we do not include it in our evaluation.
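The weighted sampling of test facts described above can be done with the standard library; a small sketch (ours, with hypothetical field names and an assumed with-replacement scheme) is:

import random

def sample_test_facts(facts, weights, k=1000, seed=0):
    # facts: list of (entity, attribute, value) triples appearing in queries;
    # weights: corresponding query frequencies. Samples k facts with probability
    # proportional to weight (with replacement, then de-duplicated; the exact
    # scheme used in the experiments is not spelled out in the text).
    rng = random.Random(seed)
    chosen = rng.choices(facts, weights=weights, k=k)
    seen, sample = set(), []
    for f in chosen:
        if f not in seen:
            seen.add(f)
            sample.append(f)
    return sample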
6. CONCLUSIONS
As online sources often provide inaccurate and conflicting information, truth discovery has become an important research problem. Existing studies all employ unsupervised approaches, which are often ineffective as it is sometimes very difficult to distinguish between true and false facts using only the data itself. In this paper we propose a semi-supervised approach that finds true values with the help of a small amount of ground truth data. Unlike existing studies that only provide iterative algorithms, we derive the optimal solution and provide an efficient iterative algorithm that converges to it. Experiments show our method achieves higher accuracy than existing approaches and can be applied on very large data sets.
7. REFERENCES
[1] A. Altman and M. Tennenholtz. On the axiomatic foundations of ranking systems. In Proc. 19th International Joint Conference on Artificial Intelligence, pages 917–922, 2005.
[2] A. Altman and M. Tennenholtz. Ranking systems: the pagerank axioms. In EC ’05: Proceedings of the 6th ACM conference on Electronic commerce, pages 1–8, New York, NY, USA, 2005. ACM Press.
[3] A. Altman and M. Tennenholtz. An axiomatic approach to personalized ranking systems. In Proc. 20th International Joint Conference on Artificial Intelligence, 2006.
[4] A. Altman and M. Tennenholtz. Quantifying incentive compatibility of ranking systems. In Proc. of AAAI-06, 2006.
[5] A. Altman and M. Tennenholtz. Incentive compatible ranking systems. In Proceedings of the 2007 International Conference on Autonomous Agents and Multiagent Systems (AAMAS-07), 2007.
[6] K. Arrow. Social Choice and Individual Values (2nd Ed.). Yale University Press, 1963.
[7] P. Avesani, P. Massa, and R. Tiella. A trust-enhanced recommender system application: Moleskiing. In SAC ’05: Proceedings of the 2005 ACM symposium on Applied computing, pages 1589–1593, New York, NY, USA, 2005. ACM Press.
[8] Y. Bakos and C. N. Dellarocas. Cooperation without enforcement? a comparative analysis of litigation and online reputation as quality assurance mechanisms. MIT Sloan School of Management Working Paper No. 4295-03, 2003.
[9] A. Borodin, G. O. Roberts, J. S. Rosenthal, and P. Tsaparas. Link analysis ranking: algorithms, theory, and experiments. ACM Trans. Inter. Tech., 5(1):231–297, 2005.
[10] A. Cheng and E. Friedman. Sybilproof reputation mechanisms. In P2PECON '05: Proceedings of the 2005 ACM SIGCOMM workshop on Economics of peer-to-peer systems, pages 128–132, New York, NY, USA, 2005. ACM Press.
[11] R. Dash, S. Ramchurn, and N. Jennings. Trust-based mechanism design. In Proceedings of the Third International Joint Conference on Autonomous Agents and MultiAgent Systems, pages 748–755, 2004.
[12] R. Guha, R. Kumar, P. Raghavan, and A. Tomkins. Propagation of trust and distrust. In WWW ’04: Proceedings of the 13th international conference on World Wide Web, pages 403–412, 2004.
[13] T. Haveliwala. Topic-sensitive pagerank. In WWW’02: Proceedings of the 11th international conference on World Wide Web, pages 517–526, 2002.
[14] T. Haveliwala, S. Kamvar, and G. Jeh. An analytical comparison of approaches to personalizing PageRank. Technical report, Stanford University, 2003.
[15] D. Houser and J. Wooders. Reputation in auctions: Theory and evidence from ebay. Working Paper, University of Arizona, 2000.
[16] J. M. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM (JACM), 46(5):604–632, 1999.
[17] R. Levien. Attack Resistant Trust Metrics. PhD thesis, University of California, Berkeley, 2002.
[18] P. Massa and P. Avesani. Controversial users demand local trust metrics: An experimental study on epinions.com community. In Proc. of AAAI-05, pages 121–126, 2005.
[19] L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: Bringing order to the web. Technical Report, Stanford University, 1998.
[20] D. M. Pennock, E. Horvitz, and C. L. Giles. Social choice theory and recommender systems: Analysis of the axiomatic foundations of collaborative filtering. In Proceedings of the National Conference on Artificial Intelligence (AAAI-2000), pages 729–734. AAAI Press, 2000.
[21] P. Resnick and R. Zeckhauser. Trust among strangers in internet transactions: Empirical analysis of ebay’s reputation system. Working Paper for the NBER workshop on empirical studies of electronic commerce, 2001.
[22] P. Resnick, R. Zeckhauser, R. Friedman, and E. Kuwabara. Reputation systems. Communications of the ACM, 43(12):45–48, 2000.
[23] A. P. Singh, A. Gunawardana, C. Meek, and A. C. Surendran. Recommendations using absorbing random walks. Manuscript under submission.
[24] G. Slutzki and O. Volij. Ranking participants in generalized tournaments. International Journal of Game Theory, 33(2):255–270, 2005.
[25] M. Tennenholtz. Reputation systems: An axiomatic approach. In Proceedings of the 20th conference on uncertainty in Artificial Intelligence (UAI-04), 2004.
Received on 10.07.2011 Accepted on 18.11.2011